Kernel Stein Discrepancy




Kernel Stein Discrepancy thinning: a theoretical perspective of pathologies and a practical fix with regularization

Neural Information Processing Systems

Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle is to greedily minimize the kernelized Stein discrepancy (KSD), which only requires the gradient of the log-target distribution and is thus well-suited for Bayesian inference. The main advantages of Stein thinning are the automatic removal of the burn-in period, the correction of the bias introduced by recent MCMC algorithms, and asymptotic guarantees of convergence towards the target distribution. Nevertheless, Stein thinning suffers from several empirical pathologies, which may result in poor approximations, as observed in the literature. In this article, we conduct a theoretical analysis of these pathologies to clearly identify the mechanisms at stake, and we suggest improved strategies. We then introduce the regularized Stein thinning algorithm to alleviate the identified pathologies. Finally, theoretical guarantees and extensive experiments demonstrate the efficiency of the proposed algorithm. An implementation of regularized Stein thinning is available as the kernax library, written in Python and JAX, at https://gitlab.com/drti/kernax.
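
To make the greedy mechanism concrete, below is a minimal JAX sketch of KSD-based thinning with an inverse multiquadric base kernel. It is a sketch of the greedy rule described above, not of the kernax implementation; the names (imq_kernel, stein_kernel, stein_thinning) and the score argument (score(x) = gradient of the log-target, e.g. jax.grad(logp)) are illustrative.

import jax
import jax.numpy as jnp

def imq_kernel(x, y, c=1.0, beta=-0.5):
    # Inverse multiquadric base kernel, a common choice for Stein thinning.
    return (c ** 2 + jnp.sum((x - y) ** 2)) ** beta

def stein_kernel(x, y, score, base_kernel=imq_kernel):
    # Langevin Stein kernel k_p(x, y), built from the base kernel and the
    # score function score(x) = grad log p(x) of the target distribution.
    gx = jax.grad(base_kernel, argnums=0)(x, y)
    gy = jax.grad(base_kernel, argnums=1)(x, y)
    # Trace of the mixed second derivative d^2 k / (dx dy).
    hxy = jnp.trace(jax.jacfwd(jax.grad(base_kernel, argnums=0), argnums=1)(x, y))
    sx, sy = score(x), score(y)
    return hxy + gx @ sy + gy @ sx + base_kernel(x, y) * (sx @ sy)

def stein_thinning(samples, score, m):
    # Greedily select m points (repeats allowed) so that the KSD of the
    # selected set is minimized at each step.
    gram = jax.vmap(lambda x: jax.vmap(lambda y: stein_kernel(x, y, score))(samples))(samples)
    obj = jnp.diag(gram) / 2.0
    selected = []
    for _ in range(m):
        i = int(jnp.argmin(obj))
        selected.append(i)
        obj = obj + gram[i]  # add cross terms with the newly selected point
    return jnp.array(selected)

Points in the burn-in region typically have a large score norm and therefore a large diagonal term k_p(x, x), which is why the greedy rule tends to skip them.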



Gradient-Free Kernel Stein Discrepancy

Fisher, Matthew A.

Neural Information Processing Systems

Stein discrepancies have emerged as a powerful statistical tool, being applied to fundamental statistical problems including parameter inference, goodness-of-fit testing, and sampling. The canonical Stein discrepancies require the derivatives of a statistical model to be computed, and in return provide theoretical guarantees of convergence detection and control. However, for complex statistical models, the stable numerical computation of derivatives can require bespoke algorithmic development and render Stein discrepancies impractical. This paper focuses on posterior approximation using Stein discrepancies, and introduces a collection of non-canonical Stein discrepancies that are gradient-free, meaning that derivatives of the statistical model are not required. Sufficient conditions for convergence detection and control are established, and applications to sampling and variational inference are presented.
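
For reference, the canonical (gradient-based) kernelized Stein discrepancy that these variants generalize is built from the Langevin Stein kernel, which depends on the model p only through its score (the gradient of log p); this score dependence is precisely what the gradient-free discrepancies remove. Up to notational conventions,

k_p(x, y) = \nabla_x \cdot \nabla_y k(x, y)
          + \nabla_x k(x, y) \cdot \nabla_y \log p(y)
          + \nabla_y k(x, y) \cdot \nabla_x \log p(x)
          + k(x, y)\, \nabla_x \log p(x) \cdot \nabla_y \log p(y),
\qquad
\mathrm{KSD}^2(q \,\|\, p) = \mathbb{E}_{x, x' \sim q}\bigl[ k_p(x, x') \bigr].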



Low Stein Discrepancy via Message-Passing Monte Carlo

Kirk, Nathan, Rusch, T. Konstantin, Zech, Jakob, Rus, Daniela

arXiv.org Artificial Intelligence

Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning. While originally designed for generating uniform point sets, we extend this framework to sample from general multivariate probability distributions with a known probability density function. Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality. Finally, we show that Stein-MPMC outperforms competing methods, such as Stein Variational Gradient Descent and (greedy) Stein Points, by achieving a lower Stein discrepancy.
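
The objective being minimized can be illustrated with a simplified, self-contained sketch: direct gradient descent on a V-statistic estimate of the squared KSD over free particle locations. This shows only the objective; the actual Stein-MPMC model parameterizes the points with a message-passing neural network, which is not reproduced here, and names such as ksd_descent are illustrative.

import jax
import jax.numpy as jnp

def ksd_descent(init_points, logp, steps=500, lr=1e-2, c=1.0):
    # Gradient descent on KSD^2 over the point locations themselves; a
    # simplified stand-in for the objective minimized by Stein-MPMC.
    score = jax.grad(logp)

    def base_k(x, y):
        # Inverse multiquadric base kernel.
        return (c ** 2 + jnp.sum((x - y) ** 2)) ** -0.5

    def stein_k(x, y):
        # Langevin Stein kernel k_p(x, y) of the target with log-density logp.
        sx, sy = score(x), score(y)
        gx = jax.grad(base_k, argnums=0)(x, y)
        gy = jax.grad(base_k, argnums=1)(x, y)
        hxy = jnp.trace(jax.jacfwd(jax.grad(base_k, argnums=0), argnums=1)(x, y))
        return hxy + gx @ sy + gy @ sx + base_k(x, y) * (sx @ sy)

    def ksd2(pts):
        # V-statistic estimate of the squared KSD of the point set.
        gram = jax.vmap(lambda x: jax.vmap(lambda y: stein_k(x, y))(pts))(pts)
        return jnp.mean(gram)

    step = jax.jit(jax.grad(ksd2))
    pts = init_points
    for _ in range(steps):
        pts = pts - lr * step(pts)
    return pts

For a standard Gaussian target, for example, logp = lambda x: -0.5 * jnp.sum(x ** 2) and a random initial point cloud are enough to watch the points spread into a low-discrepancy configuration.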


Correcting Mode Proportion Bias in Generalized Bayesian Inference via a Weighted Kernel Stein Discrepancy

Afzali, Elham, Muthukumarana, Saman, Wang, Liqun

arXiv.org Machine Learning

Generalized Bayesian Inference (GBI) provides a flexible framework for updating prior distributions using various loss functions instead of the traditional likelihood, thereby enhancing robustness to model misspecification. However, GBI often suffers from the problem of intractable likelihoods. The kernelized Stein discrepancy (KSD), as utilized in a recent study, addresses this challenge by relying only on the gradient of the log-likelihood. Despite this innovation, KSD-Bayes suffers from critical pathologies, including insensitivity to well-separated modes in multimodal posteriors. To address this limitation, we propose a weighted KSD method that retains computational efficiency while effectively capturing multimodal structure. Our method improves the GBI framework for handling intractable multimodal posteriors while maintaining key theoretical properties such as posterior consistency and asymptotic normality. Experimental results demonstrate that our method substantially improves mode sensitivity compared to standard KSD-Bayes, while retaining robust performance in unimodal settings and in the presence of outliers.
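
For context, KSD-Bayes replaces the likelihood in the Bayesian update with an exponentiated KSD loss between the model P_theta and the empirical distribution of the data. Schematically, with beta > 0 a learning-rate parameter, P_n the empirical distribution of the n observations, and up to the exact scaling used in the original work,

\pi_n(\theta) \;\propto\; \pi(\theta)\,
\exp\!\bigl\{ -\beta\, n\, \mathrm{KSD}^2\bigl( \mathbb{P}_n \,\|\, P_\theta \bigr) \bigr\}.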




Reviews: Stein Variational Gradient Descent as Moment Matching

Neural Information Processing Systems

In "Stein Variational Gradient Descent as Moment Matching," the authors first introduce the algorithm known as Stein Variational Gradient Descent (SVGD). While some work has been done trying to provide a theoretical analysis of this method, the consistency of SVGD is largely still open for finite sizes of n. By studying the fixed point solution to SVGD, they show there are a set of functions for which the the fixed point solution perfectly estimates their mean under the target distribution (they call this the Stein set of functions). They argue that using a polynomial kernel when the target is a Gaussian will force any fixed point solution of SVGD to exactly estimate the mean and covariance of the target distribution, assuming the SVGD solution points are full rank. The major contribution of this paper is that by studying the properties of finite dimensional kernels, they are able to employ random Fourier features to provide a theoretical analysis of the fixed points for these "randomized" kernels.